BUG/TST: Fixes isnull behavior on NaT in array. Closes #5443 #5524

commonlisp · 2013-11-15T16:31:06Z

I added a test case test_isnull_nat() to test_common.py and a check for NaT in lib.isnullobj. pd.isnull(np.array([pd.NaT])) now yields the correct results ([True]).

closes #5443

jreback · 2013-11-15T16:48:29Z

this needs to be profiled via test_perf.sh

commonlisp · 2013-11-15T17:57:27Z

I ran test_perf.sh on the patch. Here is the vb_log: https://gist.github.com/commonlisp/7488695

jreback · 2013-11-15T17:59:52Z

pandas/lib.pyx

@@ -213,7 +213,7 @@ def isnullobj(ndarray[object] arr):
    n = len(arr)
    result = np.zeros(n, dtype=np.uint8)
    for i from 0 <= i < n:
-        result[i] = util._checknull(arr[i])
+        result[i] = util._checknull(arr[i]) or arr[i] is NaT


this should be in util._checknull itself

commonlisp · 2013-11-15T20:07:28Z

Ah, yes, I did have the check in util._checknull at first, but this creates a circular dependency between util and tslib where NaT is defined. Is that ok? Or perhaps NaT should be factored out.

jreback · 2013-11-15T20:21:18Z

actually...this whole change is trivial....just change _checknull to checknull in isnullobj; this already checks for NaT. and then profile again (it prob will be fine because isnullobj is only called if it's object type in the first place)
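To illustrate the distinction being discussed, here is a minimal pure-Python model of a narrow null check (None or float NaN) next to a broader one that also recognizes NaT. This is only a sketch: `_NaTType`, `checknull_basic`, and `checknull_full` are hypothetical names for illustration, not the actual `util._checknull` or `lib.checknull` source.

```python
import math


class _NaTType:
    """Hypothetical stand-in for the pandas NaT singleton."""

    def __repr__(self):
        return "NaT"


NaT = _NaTType()


def checknull_basic(val):
    """Model of the narrow check: None or a float NaN only."""
    return val is None or (isinstance(val, float) and math.isnan(val))


def checknull_full(val):
    """Model of the broader check: also treats the NaT sentinel as null."""
    return val is NaT or checknull_basic(val)


values = [None, float("nan"), NaT, 1.5, "a"]
print([checknull_basic(v) for v in values])  # NaT is missed by the narrow check
print([checknull_full(v) for v in values])   # NaT is reported as null
```

In this model, switching the per-element call from the narrow check to the broad one is what makes NaT show up as null, which mirrors the suggestion above.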

commonlisp · 2013-11-15T23:16:49Z

Great call. Using checknull now and added the new explicit test case. Here is a fresh test_perf.sh: https://gist.github.com/commonlisp/7492647. Going to run it again just to make sure there aren't any aberrations.

commonlisp · 2013-11-19T15:39:14Z

I added a gist https://gist.github.com/commonlisp/7547232 with another run of test_perf.sh. It appears to agree with the previous run.

commonlisp · 2013-11-21T15:03:45Z

The test_perf.sh results for modifying isnullobj directly and for using checknull do differ somewhat (1.22 vs. 1.87). The question is whether this is within test_perf.sh's margin of error.

jreback · 2013-11-21T15:09:05Z

that's definitely out of bounds, pls see if it's something simple..

commonlisp · 2013-12-15T06:45:05Z

I've run a good number of performance tests. It seems that util.is_float_object, util.is_complex_object, util.is_datetime64_object, and util.is_timedelta64_object, though declared inline functions in numpy_helper.h, are contributing considerably to the runtime (~1.26-1.5x individually). We could factor out yet another version of lib.checknull that skips those checks to support the lib.isnullobj case.
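For anyone wanting to reproduce this kind of comparison outside test_perf.sh, a rough micro-benchmark along these lines may help. It is a pure-Python sketch, not the Cython code under review: `narrow_check` and `broad_check` are hypothetical, and `datetime.datetime` / `datetime.timedelta` merely stand in for the datetime64 and timedelta64 checks. The printed ratio will vary by machine and says nothing definitive about the compiled code.

```python
import datetime
import math
import timeit


def narrow_check(val):
    """None or float NaN only."""
    return val is None or (isinstance(val, float) and math.isnan(val))


def broad_check(val):
    """Runs extra per-element type tests before falling back to narrow_check."""
    if isinstance(val, (float, complex)):
        return val != val  # NaN is the only value unequal to itself
    if isinstance(val, datetime.datetime):   # stand-in for a datetime64 test
        return False
    if isinstance(val, datetime.timedelta):  # stand-in for a timedelta64 test
        return False
    return narrow_check(val)


# Object-like data: mostly values that fall through every type test.
data = ["x", None, 1.0, float("nan")] * 2500


def run(check):
    return [check(v) for v in data]


# Take the minimum of several repeats to reduce noise.
narrow_time = min(timeit.repeat(lambda: run(narrow_check), number=20, repeat=5))
broad_time = min(timeit.repeat(lambda: run(broad_check), number=20, repeat=5))
print("ratio broad/narrow: %.2f" % (broad_time / narrow_time))
```

Both functions give the same answers on this data, so any difference in timing comes from the extra type tests alone.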

jreback · 2013-12-15T14:28:26Z

ok I guess having a special version of checknull_object might be useful for object arrays where u can skip certain checks (but still need some that _checknull does not do)

cancan101 · 2013-12-16T05:53:08Z

Any reason not to get this merged in for v0.13?

jreback · 2013-12-16T13:43:56Z

@cancan101 have to nail down the perf issue, this code is used everywhere

cancan101 · 2013-12-16T14:11:18Z

That makes sense. I wasn't sure if a perf fix was worth waiting for.

commonlisp · 2013-12-18T04:25:51Z

New test_perf.sh for this factoring of lib.checknull: https://gist.github.com/anonymous/8017235

jreback · 2013-12-18T15:09:39Z

pandas/lib.pyx

@@ -170,20 +170,23 @@ cdef inline int64_t get_timedelta64_value(val):
 cdef double INF = <double> np.inf
 cdef double NEGINF = -INF

+cpdef checknull_NaT(object val):


this doesn't need to be cpdef (only cdef)

make it _checknull_NaT as well
make it inline and I don't think you need util. when calling _checknull

btw - sometimes it can actually be faster to manually inline the function
(via copy/paste) vs. using the inline keyword. It's strange but if you
can't pull out more perf, might be helpful.

@jtratner is right...just putting it in 2 places is ok (but need to avoid evaluating arr[i] twice)...

@commonlisp I know this is tedious..but this type of low-level coding affects so many things....

commonlisp · 2014-01-02T16:34:58Z

I think this is ready for review again. Please let me know if anyone has feedback.

jreback · 2014-01-02T18:12:10Z

can you squash the commits: https://github.com/pydata/pandas/wiki/Using-Git

and post a perf run...just to check

thanks

jreback · 2014-01-03T18:07:47Z

check out this: https://github.com/pydata/pandas/wiki/Using-Git

you don't want to merge in other commits (nor merges)

so git rebase -i origin/master...then delete everything but your commit

then

git push myfork thisbranchname -f

commonlisp · 2014-01-06T21:31:28Z

Commits squashed, not merging any other commits
Recent test_perf.sh https://gist.github.com/commonlisp/8290153

jreback · 2014-01-06T21:33:55Z

you need to take out all but your commit

jreback · 2014-01-06T22:55:48Z

do this:

git commit -C HEAD --amend

then force push

this will make it build again

then I'll merge it....

common._isnull_ndarraylike(...) uses lib.isnullobj to check nulls/NaN/NaT in ndarray, which in turn relies on util._checknull. _checknull did not know about NaT, but now lib.isnullobj does, while still maintaining performance by doing arr[i] only once. Added a test case test_isnull_nat() to test_common.py and check for NaT in lib.isnullobj. pd.isnull(np.array([pd.NaT])) now yields the correct results ([True]).
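The "arr[i] only once" point in the commit message can be sketched in plain Python as follows. This is an illustrative model under stated assumptions, not the merged Cython: `CountingSeq`, `isnullobj_model`, and the `NaT` stand-in are hypothetical, and the counter exists only to show that each element is read a single time.

```python
import math


class _NaTType:
    """Hypothetical stand-in for the pandas NaT singleton."""


NaT = _NaTType()


class CountingSeq:
    """Sequence wrapper that counts element reads (for illustration only)."""

    def __init__(self, items):
        self._items = list(items)
        self.reads = 0

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        self.reads += 1
        return self._items[i]


def _checknull(val):
    """Narrow check: None or float NaN."""
    return val is None or (isinstance(val, float) and math.isnan(val))


def isnullobj_model(arr):
    """Flag nulls, reading each element into a local exactly once."""
    result = [False] * len(arr)
    for i in range(len(arr)):
        val = arr[i]  # single read; both checks below reuse the local
        result[i] = val is NaT or _checknull(val)
    return result


arr = CountingSeq([NaT, None, float("nan"), "a", 0])
mask = isnullobj_model(arr)
print(mask, arr.reads)
```

Writing `_checknull(arr[i]) or arr[i] is NaT` instead would index the array twice for every non-null element, which is the cost the review asked to avoid.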

jreback · 2014-01-06T23:46:32Z

@commonlisp
thanks!

first time working thru the process is painful......next time should be easier!

commonlisp · 2014-01-06T23:48:28Z

@jreback Thank you!

jreback reviewed Nov 15, 2013

jreback reviewed Dec 18, 2013

jreback merged commit f216c74 into pandas-dev:master Jan 6, 2014
